145 research outputs found

    Teaching Data Science. Constructing Pillars in a Fluid Field

    Get PDF

    A Study of Four Index Structures for Set-Valued Attributes of Low Cardinality

    Full text link
    We review and study the performance of four different index structures for indexing set-valued attributes designed to speed up set equality, subset and superset queries. All index structures are based on traditional techniques, namely signatures and inverted files. More specifically, we consider sequential signature files, signature trees, extendible signature hashing, and a B-tree based implementation of inverted lists. The latter is refined by a compression scheme in order to keep space requirements within acceptable bounds. The performance study is based on real implementations subjected to a benchmark accounting for different set sizes, domain sizes, and data distributions (uniform and skewed)

    Evaluation of Main Memory : Join Algorithms for Joins with Set Comparison Predicates

    Full text link
    Current data models like the NF2 model and object-oriented models support set-valued attributes. Hence, it becomes possible to have join predicates based on set comparison. This paper introduces and evaluates several main memory algorithms to evaluate efficiently this kind of join. More specifically, we concentrate on the set equality and the subset predicates

    Compiling Away Set Containment and Intersection Joins

    Full text link
    We investigate the effect of query rewriting on joins involving set-valued attributes in object-relational database management systems. We show that by unnesting set-valued attributes (that are stored in an internal nested representation) prior to the actual set containment or intersection join we can improve the performance of query evaluation by an order of magnitude. By giving example query evaluation plans we show the increased possibilities for the query optimizer

    Indexing a Fuzzy Database Using the Technique of Superimposed Coding - Cost Models and Measurements

    Full text link
    Recently, new applications have emerged that require database management systems with uncertainty capabilities. Many of the existing approaches to modelling uncertainty in database management systems are based on the theory of fuzzy sets. High performance is a necessary precondition for the acceptance of such systems by end users. However, performance issues have been quite neglected in research on fuzzy database management systems so far. In this article they are addressed explicitly. We propose new index structures for fuzzy database management systems based on the well known technique of superimposed coding together with detailed cost models. The correctness of the cost models as well as the efficiency of the index structures proposed is validated by a number of measurements on experimental fuzzy databases

    Inferring offline hierarchical ties from online social networks

    Get PDF
    Social networks can represent many different types of relationships between actors, some explicit and some implicit. For example, email communications between users may be represented explicitly in a network, while managerial relationships may not. In this paper we focus on analyzing explicit interactions among actors in order to detect hierarchical social relationships that may be implicit. We start by employing three well-known ranking-based methods, PageRank, Degree Centrality, and Rooted-PageRank (RPR) to infer such implicit relationships from interactions between actors. Then we propose two novel approaches which take into account the time-dimension of interactions in the process of detecting hierarchical ties. We experiment on two datasets, the Enron email dataset to infer manager-subordinate relationships from email exchanges, and a scientific publication co-authorship dataset to detect PhD advisor-advisee relationships from paper co-authorships. Our experiments show that time-based methods perform considerably better than ranking-based methods. In the Enron dataset, they detect 48% of manager-subordinate ties versus 32% found by Rooted-PageRank. Similarly, in co-author dataset, they detect 62% of advisor-advisee ties compared to only 39% by Rooted-PageRank

    Early Grouping Gets the Skew

    Full text link
    We propose a new algorithm for external grouping with large results. Our approach handles skewed data gracefully and lowers the amount of random IO on disk considerably. Contrary to existing grouping algorithms, our new algorithm does not require the optimizer to employ complicated or error-prone procedures adjusting the parameters prior to query plan execution. We implemented several variants of our algorithm as well as the most commonly used algorithms for grouping and carried out extensive experiments on both synthetic and real data. The results of these experiments reveal the dominance of our approach. In case of heavily skewed data we outperform the other algorithms by a factor of two

    Nested Queries and Quantifiers in an Ordered Context

    Full text link
    We present algebraic equivalences that allow to unnest nested algebraic expressions for order-preserving algebraic operators. We illustrate how these equivalences can be applied successfully to unnest nested queries given in the XQuery language. Measurements illustrate the performance gains possible by our approach
    • …
    corecore